feat: add GPU inference option to web UI #34

Open
EruditionHerta wants to merge 1 commit into OpenMOSS:main from EruditionHerta:feature/gpu-web-inference

@EruditionHerta

Summary

Allow users to select a CUDA device for inference in the web interface.

Previously, the app forced CPU-only mode regardless of the --device flag.

The app now supports --device cuda, --device auto, and per-request device selection via the web UI dropdown.
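
To illustrate the startup behaviour summarized above, here is a minimal sketch of how a `--device` value could be resolved. It is not the code from this PR: `resolve_device` and its `cuda_available` callback are hypothetical names, and falling back to CPU when CUDA is requested but unavailable is an assumption, since the description does not say how that case is handled.

```python
from typing import Callable


def resolve_device(requested: str, cuda_available: Callable[[], bool]) -> str:
    """Map a --device value (cpu, cuda, cuda:N, auto) to a concrete device string.

    Hypothetical helper for illustration only. `cuda_available` is injected
    so the logic can be exercised without a GPU library installed.
    """
    requested = requested.strip().lower()

    if requested == "cpu":
        return "cpu"

    if requested == "auto":
        # Auto-detect: GPU if available, else CPU.
        return "cuda" if cuda_available() else "cpu"

    if requested == "cuda" or requested.startswith("cuda:"):
        # Assumption: degrade to CPU instead of aborting when no GPU is found.
        return requested if cuda_available() else "cpu"

    raise ValueError(f"unsupported device: {requested!r}")


if __name__ == "__main__":
    print(resolve_device("auto", lambda: False))
    print(resolve_device("cuda:1", lambda: True))
```

In a PyTorch-based app, the callback would presumably be `torch.cuda.is_available`; passing it in as a parameter keeps the sketch runnable on machines without PyTorch.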

Changes

main(): Removed forced CPU override, added cuda/auto device resolution with CUDA availability detection

RequestRuntimeManager: Extended normalize_requested_execution_device() whitelist to accept cuda and cuda:N, added _build_cuda_runtime_locked() for dynamic GPU runtime creation

Web UI: Added Device selector dropdown (Default/CPU/CUDA:N) in Generation Options

API endpoints: /api/generate and /api/generate-stream/start accept an execution_device parameter; /health reports cuda_available
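
The whitelist extension described for `normalize_requested_execution_device()` can be pictured with the sketch below. It is an assumption-laden illustration, not the PR's implementation: the function name `normalize_device_sketch`, the treatment of an empty or `default` value as "no override", and the `ValueError` on unknown values are all hypothetical choices.

```python
import re
from typing import Optional

# Accepts "cuda" or "cuda:N", where N is a non-negative integer.
_CUDA_PATTERN = re.compile(r"cuda(?::(\d+))?")


def normalize_device_sketch(value: Optional[str]) -> Optional[str]:
    """Validate a per-request execution_device value against a whitelist.

    Returns None when no override was requested, otherwise a canonical
    device string ("cpu", "cuda", or "cuda:N"). Hypothetical helper.
    """
    if value is None:
        return None

    text = value.strip().lower()
    if text in ("", "default"):
        # Assumption: the UI's "Default" option means "use the startup device".
        return None
    if text == "cpu":
        return "cpu"

    match = _CUDA_PATTERN.fullmatch(text)
    if match:
        index = match.group(1)
        # Canonicalize the index so "cuda:01" and "cuda:1" compare equal.
        return "cuda" if index is None else f"cuda:{int(index)}"

    raise ValueError(f"unsupported execution_device: {value!r}")
```

Rejecting anything outside the whitelist before it reaches runtime creation keeps arbitrary client-supplied strings from being passed on as device names.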

Test Plan

  • python app.py --device cpu (default behavior unchanged)
  • python app.py --device cuda (GPU mode)
  • python app.py --device auto (auto-detect)
  • Web UI shows Device selector with CUDA options when a GPU is available
  • Switching between CPU/CUDA in the web UI works correctly
  • Streaming generation works with GPU device

Allow users to select a CUDA device for inference in the web interface.
Previously the app was forced to CPU-only mode. Now supports:
- `--device cuda` to start on GPU
- `--device auto` to auto-detect (GPU if available, else CPU)
- Device selector dropdown in the web UI
- Dynamic GPU runtime creation when requested per-request
- `/health` endpoint reports `cuda_available`